Token Extraction
location
: the Token Extraction Site. Defines the location of the extracted token.header
: the name of the header to be extractedextractor
: a list of TokenExtractors to be executed in order
Token Extraction Site
request
request_cookie
response
Token Extractor
An extraction instruction that takes in a string, performs an extraction on it, and outputs a string. Meant to be chained together.
regex
: performs a given regex on a string and outputs the First capture group of first match, or entire first match if no capture groupjwt_decode
: extracts the payload of a jwt tokenjson_path
: extracts a value from a json body using the given JsonPathhash
: hashes the currently extracted value with SHA256. intended to be used at the end of a set of extraction instructionsmetadata
: works differently than the rest of the extractors. this will "fork" the extractor at the current extraction step and parse any amount of tokens you want from it, then insert them into thetoken_metadata
field of theMatchData
output. This is great if you have any additional metadata you want to extract from a token without actually setting that value as the token.
Examples:
generic example that extracts the first 4 letters of the "name" field in a jwt token
token_extractor:
location: request
header: jwt
extractor:
- jwt_decode
- json_path: $.name
- regex: \b[a-zA-Z]{4}\b
a similar example, except this one extracts multiple bits of metadata during the main tokens extraction
token_extractor:
location: request
header: jwt
extractor:
- jwt_decode
# extra metadata to extract after the jwt has been decoded but before the rest of the token is extracted
- metadata:
# extracts some digits
"a": !regex \d+
# this step wont actually produce anything since the token were operating over will fail multiple jwt_decodes
"b":
- jwt_decode
- jwt_decode
- jwt_decode
# extracts the contents of the "sub" key then hashes it
"c":
- json_path: $.sub
- hash
- json_path: $.name
- regex: \b[a-zA-Z]{4}\b
an example of what the output of the above policy might look like in the MatchData
output from leaksignal:
"token": "John",
"token_metadata": {
"c": "C775E7B757EDE630CD0AA1113BD102661AB38829CA52A6422AB782862F268646",
"a": "1234567890"
}
Body Extraction
Similar to token extraction, leaksignal also supports extracting metadata from specific fields in a request/response body. The way this works is simpler than Token Extraction, you just supply a mapping of keys and json paths and any value that matches that path during parsing will have its value placed in the resulting proto.
Example 1:
endpoints:
- request_extractors:
"ssn":
- "test.my_ssn3[*].my_ssn4"
- "test.my_ssn2"
"never": "aaa.aaa.aaa.aaa.aaa.hello"
response_extractors:
"ssn2": "test.my_ssn3[*].my_ssn4"
An example of what the output of the above policy might look like in the MatchData
output from leaksignal. Here blob_idx
refers to the index of the json body for cases where a single payload will contain multiple separate json objects:
"matches": {
"response": {
"body_metadata": [
{
"key": "ssn",
"value": "123-45-4892",
"blob_idx": 0
},
{
"key": "ssn",
"value": "123-45-4895",
"blob_idx": 0
}
]
}
}
Example 2:
While body extraction was originally designed to pull whole values from json fields, an alternative syntax exists that allows you to use a regex to extract sub-values from the selected fields like so:
endpoints:
response_extractors:
"ssn":
path: "test.my_ssn3[*].my_ssn4"
regex: "\\d{4}"
"never":
- "hello.aaa.aaa.aaa.aaa.aaa"
- "goodbye.aaa.aaa.aaa.aaa.aaa"
As you can see, the ssn
extractor uses the alternative syntax to extract the last 4 digits of a social security number from the selected field.
Body Extraction Notes
Body extractors like this exist to help give context to matches. Because of that, by default, values will only be extracted from blobs that contain matches. If you want to override this behavior, you can use the following alternative syntax to set the include_all
flag and retrieve metadata from every blob regardless of matches.
endpoints:
request_extractors:
extractors:
"ssn":
- "test.my_ssn3[*].my_ssn4"
- "test.my_ssn2"
"never": "aaa.aaa.aaa.aaa.aaa.hello"
include_all: true
response_extractors:
"ssn2": "test.my_ssn3[*].my_ssn4"
As you can see, several alternate syntaxes exist for body extraction depending on the desired behavior, and any combination of the formats can be used. The following are some examples of possible ways they can be combined:
response_extractors:
"test": "hello.aaa"
response_extractors:
"test":
- "hello.aaa"
- "goodbye.aaa"
response_extractors:
"test":
path: "hello.aaa"
regex: "\\d{4}"
request_extractors:
extractors:
"test":
- path: "hello.aaa"
regex: "\\d{4}"
- path: "goodbye.aaa"
regex: ".{4}"
include_all: true